Maximum Sustainable Throughput Prediction for Large-Scale Data Streaming Systems

Abstract

In cloud-based stream processing services, the maximum sustainable throughput (MST) is defined as the maximum throughput that a system composed of a fixed number of virtual machines (VMs) can ingest indefinitely. If the incoming data rate exceeds the system’s MST, unprocessed data accumulates and eventually renders the system inoperable. It is therefore important for the service provider to keep the MST above the incoming data rate at all times by allocating a sufficient number of VMs. In this paper, we propose a cost-effective framework that predicts MST values for a given number of VMs for stream processing applications with various scalability characteristics. Since a single prediction model is unlikely to work well across diverse stream processing applications, we first train several models for each application using linear regression. We then select the best-fitting model for the target application by evaluating additional MST samples. To reduce the cost and time of collecting MST samples while maintaining high prediction accuracy, we statistically determine the most effective set of VM configurations to sample within a given budget. For evaluation, we use Intel’s Storm benchmarks running on the Amazon EC2 cloud. In experiments with up to 128 VMs, the models trained by our framework predict MST values with average prediction errors of up to 15.8%. We further evaluate our prediction models with simulation-based elastic VM scheduling on a realistic data streaming workload. Simulation results show that with 20% over-provisioning, our framework achieves less than 0.1% SLA violations for the majority of test applications, saving 36% of the cost compared with static VM scheduling that provisions for the peak workload to reach the same level of SLA violations.
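The abstract does not specify the candidate model shapes, the selection metric, or how the scheduler turns a prediction into a VM count, so the following is only a minimal sketch under stated assumptions. It assumes three hypothetical regression feature maps (linear, logarithmic, and quadratic in the VM count n), fits each by ordinary least squares, selects the model with the lowest mean absolute percentage error on extra validation samples, and then picks the smallest VM count whose predicted MST covers the incoming rate plus 20% headroom. The names `CANDIDATES`, `select_model`, and `vms_needed` are illustrative, the sample data are synthetic (not results from the paper), and the budget-constrained statistical choice of sample points is not shown.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Features = Callable[[int], List[float]]

# Hypothetical candidate models: each maps a VM count n to a feature vector,
# so every model is linear in its coefficients and fits by linear regression.
CANDIDATES: Dict[str, Features] = {
    "linear": lambda n: [1.0, float(n)],
    "logarithmic": lambda n: [1.0, math.log2(n)],
    "quadratic": lambda n: [1.0, float(n), float(n) ** 2],
}


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (m[r][k] - sum(m[r][c] * x[c] for c in range(r + 1, k))) / m[r][r]
    return x


def fit(features: Features, samples: Sequence[Tuple[int, float]]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    rows = [features(n) for n, _ in samples]
    ys = [mst for _, mst in samples]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(features: Features, weights: List[float], n: int) -> float:
    return sum(w * f for w, f in zip(weights, features(n)))


def select_model(train, validation) -> Tuple[str, List[float], float]:
    """Fit every candidate on `train`; keep the lowest MAPE on `validation`."""
    best: Optional[Tuple[str, List[float], float]] = None
    for name, features in CANDIDATES.items():
        weights = fit(features, train)
        mape = sum(abs(predict(features, weights, n) - mst) / mst
                   for n, mst in validation) / len(validation)
        if best is None or mape < best[2]:
            best = (name, weights, mape)
    assert best is not None
    return best


def vms_needed(features: Features, weights: List[float], rate: float,
               over_provision: float = 0.2, max_vms: int = 128) -> Optional[int]:
    """Smallest VM count whose predicted MST covers rate * (1 + over_provision)."""
    target = rate * (1.0 + over_provision)
    for n in range(1, max_vms + 1):
        if predict(features, weights, n) >= target:
            return n
    return None  # even max_vms is predicted to be insufficient


# Synthetic samples (NOT measurements from the paper): MST = 500 + 1000 * log2(n).
def true_mst(n: int) -> float:
    return 500.0 + 1000.0 * math.log2(n)


train = [(n, true_mst(n)) for n in (1, 2, 4, 8)]
validation = [(n, true_mst(n)) for n in (16, 32)]
name, weights, mape = select_model(train, validation)
print(name)  # logarithmic
n = vms_needed(CANDIDATES[name], weights, rate=4000.0)
print(n)  # 20
```

Because each candidate is linear in its coefficients, a single least-squares routine serves all of them; selecting on held-out samples at larger VM counts rewards models that extrapolate well rather than ones that merely fit the training points.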





Publication date: 2017